ABSTRACT


This report summarizes the research conducted at the Center for
Reliable Computing with the approval of the Air Force Office of
Scientific Research under Contract No. F49620-79-C-0069 for the period
1 May 1979 to 31 October 1980. Major results and current work, in
various aspects of computer system reliability evaluation and design,
1 INTRODUCTION

This scientific report describes the research activities at the
Center for Reliable Computing (CRC), Stanford University Computer
Systems Laboratory, during the period May 1, 1979 to October 31, 1980.

The  principal  research  results  described  are: 

1.  Statistical  study  of  system  utilization  and  failures  at 
Stanford  Linear  Accelerator  Center  (SLAC)  Computer  Facility. 

2. A statistical approach towards modeling uncertainty in
system reliability due to uncertainty in failure rate estimation.

3.  Consistency  checking  for  generating  reliable  designs. 

4.  Testability  considerations  in  digital  system  design. 

In  section  2  we  summarize  the  important  results  in  each  of  the 
above  problem  areas. 


2  RESEARCH  DESCRIPTION 

2.1  Computer  Reliability  and  Effect  on  Utilization 

A  broad-based  survey  of  techniques  for  building  reliable 
computing systems was presented in [McCluskey, 1980]. Various methods
for  providing  specific  types  of  fault  tolerance  were  discussed.  The 
types  of  malfunctions  in  a  computer  system  and  the  possible  responses 
to  these  malfunctions  were  described.  A  critical  review  of  techniques 
for  obtaining  these  responses  was  also  made. 

[Miller, 1979] presented a taxonomy of fault-tolerant
techniques. The paper placed the many classes of fault-tolerant
techniques in a hierarchy ordered by technical characteristics. Such
an approach provides a basis for presenting and comparing the
techniques in a logical manner.

Major  effort  was  concentrated  on  the  study  of  failures  and 
system  load.  Two  large  computer  complexes  at  Stanford  University  have 
reliability  data  available  for  study.  Fortunately,  SLAC  (the  Stanford 
Linear  Accelerator  Center)  and  CIT  (the  Center  for  Information 
Technology)  are  functionally  similar  and  are  composed  of  similar 
equipment.  This  makes  direct  comparison  of  study  results  possible. 
However, the physical system organizations, both component
interconnection and component redundancy, are quite different, and the
workload and levels of utilization vary considerably between the two
installations.

The availability of both human-collected and machine-recorded
failure data, along with corresponding load/performance data, provides a
unique opportunity to study the effect of utilization levels on
component and system failures.

Initial work was built upon prior research at CRC [Beaudry,
1979], which was concerned with the relationships between time-of-day
and various types of failures at the SLAC triplex. The time-of-day
aspect of failure modeling aroused interest in the overall profile of
system load.

Prior to an investigation of load, an independent analysis of
the failure data was performed. Gross measures of MTBF, MTTR, and
system availability were computed for important categories of failure.
In particular, hardware, software, operator-induced, and
utility/facility failures were analyzed separately, both for
component-only and for system outages. The results were reported in
[Butner, 1980a].
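The gross measures can be illustrated with a minimal Python sketch. The outage record layout (`Outage`, with start time, repair duration and category) and the function `gross_measures` are hypothetical, not the format of the SLAC logs, and the sketch assumes that when one category is analyzed, outages of other categories count as up time.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start_h: float     # hypothetical: hours since start of the observation window
    duration_h: float  # hypothetical: time to repair, in hours
    category: str      # e.g. "hardware", "software", "operator", "facility"


def gross_measures(outages, window_h, category=None):
    """Return (MTBF, MTTR, availability) for one failure category.

    MTBF = up time / number of failures, MTTR = down time / number of
    failures, availability = MTBF / (MTBF + MTTR). Assumption: when a single
    category is analyzed, outages of other categories count as up time.
    """
    selected = [o for o in outages if category is None or o.category == category]
    if not selected:
        raise ValueError("no outages recorded for this category")
    n = len(selected)
    down = sum(o.duration_h for o in selected)
    up = window_h - down
    mtbf, mttr = up / n, down / n
    return mtbf, mttr, mtbf / (mtbf + mttr)


log = [Outage(10, 2, "hardware"), Outage(50, 1, "software"), Outage(90, 3, "hardware")]
print(gross_measures(log, 100, "hardware"))  # (47.5, 2.5, 0.95)
```

With this definition, availability reduces to up time divided by the length of the observation window.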

A  preliminary  search  for  available  load/performance  data 
indicated  an  enormous  quantity  of  SMF  (IBM  System  Management  Facility) 
raw  data.  An  entire  high-density  2400-foot  magnetic  tape  contains 
approximately  24  days  of  these  highly  detailed  SMF  records.  By 
contrast,  the  total  SLAC  failures  amount  to  only  5-6  per  day  (1500-2000 
per  year).  Thus,  an  early  challenge  was  to  meaningfully  reduce  the 
voluminous  performance  data  in  order  to  allow  direct  manipulation  and 
comparison  with  a  year  of  failure  data. 

The performance data are collected automatically by the IBM
system software. There are approximately 50 different types of SMF
data. The data contain information on the initiation, processing, and
termination of jobs, on batch streams, on interactive user sessions,
and on other important events. Initially, the "job step" record was
selected for processing. This record corresponds one-to-one to an
executed batch user job step. The record includes CPU time,
step-elapsed time, I/O counts by device, paging, and other performance
and accounting data. From this record four data elements were chosen
for study:

o PAGING - the sum of page-ins and page-outs for the step.

o EXCPS - the sum of all I/O initiations for the step.

o CPU - the central processor time used for the step.

o HOUR - the hour of the day during which the job started.

Job  steps  which  were  never  executed  and  those  corresponding  to 
continuously-running  system  jobs  (e.g.  WYLBUR)  were  discarded. 
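A minimal sketch of this reduction step, assuming each raw job-step record has already been decoded into a Python dict. The key names (`executed`, `jobname`, `page_ins`, `excp_by_device`, and so on) and the `SYSTEM_JOBS` set are hypothetical placeholders, not actual SMF field names.

```python
from typing import NamedTuple, Optional


class StepSample(NamedTuple):
    paging: int   # page-ins + page-outs for the step
    excps: int    # all I/O initiations (EXCPs) for the step
    cpu_s: float  # central processor time used by the step, in seconds
    hour: int     # hour of day (0-23) during which the job started


# Hypothetical set of continuously running system jobs to discard.
SYSTEM_JOBS = {"WYLBUR"}


def reduce_step(rec: dict) -> Optional[StepSample]:
    """Reduce one decoded job-step record to the four study elements.

    The dict keys used here are hypothetical placeholders, not SMF field
    names. Returns None for steps that are discarded.
    """
    if not rec["executed"] or rec["jobname"] in SYSTEM_JOBS:
        return None  # never-executed step, or a continuously running system job
    return StepSample(
        paging=rec["page_ins"] + rec["page_outs"],
        excps=sum(rec["excp_by_device"].values()),
        cpu_s=rec["cpu_s"],
        hour=rec["job_start_hour"],
    )
```

Keeping only four small fields per executed step is what allows a year of load data to be handled alongside the failure log.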

From  the  reduced  data,  four  "virtual  day"  load  profiles  were 
formed.  The  four  load  measures  were  job  steps  executed  per  hour, 
paging  rate  per  hour,  user  CPU  time  per  hour,  and  I/O  starts  per  hour. 
The  virtual  day  profiles  depicted  the  average  value  of  each  load 
measure  for  each  hour  of  the  day.  The  profiles  were  statistically 
compared  with  average  failure  rates  by  hour.  Final  results  of 
regression  and  analysis  of  variance  were  presented  at  FTCS-10  [Butner, 
1980b]. 
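The virtual-day computation and a simple regression of hourly failure rate on one load measure might look as follows. This is a sketch under stated assumptions: hourly load totals are keyed by a hypothetical `(day, hour)` pair, hours with no recorded load count as zero, and only an ordinary least-squares fit with its correlation coefficient is shown. The analysis of variance reported in [Butner, 1980b] is not reproduced.

```python
def virtual_day(hourly_totals, n_days):
    """Average load for each hour of the day over n_days observed days.

    hourly_totals maps a hypothetical (day, hour) key to the load measured
    in that clock hour (e.g. job steps executed, page-ins + page-outs, CPU
    seconds, or I/O starts). Hours with no entry count as zero load.
    """
    sums = [0.0] * 24
    for (_day, hour), value in hourly_totals.items():
        sums[hour] += value
    return [s / n_days for s in sums]


def least_squares(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r), where r is
    the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy / (sxx * syy) ** 0.5


# Usage (hypothetical inputs): regress average failures per hour on one profile.
# load = virtual_day(job_steps_per_hour, n_days)
# fail = virtual_day(failures_per_hour, n_days)
# a, b, r = least_squares(load, fail)
```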

Other reliability work was devoted to a very important (though
much neglected) practical problem in reliability prediction: the
effect of uncertainty in failure rate estimation on system
reliability. The problem is particularly acute in ultra-reliable
systems, where failures are rare and hence the uncertainty in the
estimated failure rates is high. Two approaches to modeling this
phenomenon were developed, the first exact and the second approximate.
The usefulness of such models was illustrated using real,
manufacturer-provided data on system failures. The results were
presented at FTCS-10 [Iyer, 1980].
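The exact and approximate models of [Iyer, 1980] are not reproduced here. As an illustration of the idea only, suppose a component has an exponential lifetime with failure rate lam, and the uncertainty in lam is described by a gamma distribution with shape k and rate theta (for example, after observing k failures in theta hours of operation; this distributional choice is an assumption of the sketch). The mean reliability then has a closed form, and a second-order Taylor expansion about the mean rate gives an approximation.

```python
import math


def mean_reliability_exact(k, theta, t):
    """E[exp(-lam*t)] when lam ~ Gamma(shape=k, rate=theta) (assumed model):
    equals (theta / (theta + t)) ** k."""
    return (theta / (theta + t)) ** k


def reliability_variance(k, theta, t):
    """Var[exp(-lam*t)] under the same assumed gamma model."""
    return (theta / (theta + 2 * t)) ** k - (theta / (theta + t)) ** (2 * k)


def mean_reliability_approx(k, theta, t):
    """Second-order Taylor expansion of exp(-lam*t) about the mean rate:
    E[R] ~ exp(-mu*t) * (1 + var*t**2/2), with mu = k/theta, var = k/theta**2."""
    mu, var = k / theta, k / theta ** 2
    return math.exp(-mu * t) * (1 + var * t ** 2 / 2)


k, theta, t = 2, 10_000.0, 1_000.0           # 2 failures in 10,000 h; 1,000 h mission
print(math.exp(-k / theta * t))              # point estimate, about 0.8187
print(mean_reliability_exact(k, theta, t))   # about 0.8264
print(mean_reliability_approx(k, theta, t))  # about 0.8269
print(reliability_variance(k, theta, t))
```

Because exp(-lam*t) is convex in lam, averaging over the uncertain rate never gives less than the point estimate; the variance term quantifies how far a single prediction can be trusted.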


2.2  Consistency  Checking 

A crucial requirement in the design of high-reliability
multiprocessor systems is the maintenance of consistency among the
processing units: consistency in their concept of time (affecting
synchronization), in their output (affecting reliable performance),
and in their view of the integrity of the whole system (affecting
reliable reconfiguration).

Some recent systems, such as Pluribus and SIFT, have chosen to
solve the consistency problem with software algorithms in order to
gain flexibility and lower hardware costs. However, hardware
implementations offer several advantages, such as efficiency and
greater testability.

The  design  of  the  Consistency  Unit  (CU)  [Fu,  1980a]  was  a 
demonstration  of  the  concept  of  implementing  a  fault-tolerant  algorithm 
in  hardware  using  Very  Large-Scale  Integrated  (VLSI)  circuit 
techniques.  This  unit  acts  as  an  intelligent  inter-processor  bus 
interface  in  a  four-processor  system  in  such  a  way  that  any  failure  in 
a  single  processor  and  its  associated  bus  cannot  affect  the  consistency 
of  the  data  exchanged  among  the  remaining  processors. 
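The CU's hardware algorithm is not given here. As an illustration only, the classic two-round interactive consistency exchange (Pease, Shostak and Lamport) achieves the stated property for four processors with at most one arbitrarily faulty member. In this Python sketch the faulty processor's behavior is modeled by a hypothetical `lie` function applied to every message it sends.

```python
from collections import Counter

N = 4  # processors; tolerates one arbitrarily faulty processor


def majority(values, default=None):
    """Return the value held by a strict majority, or `default` if none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else default


def interactive_consistency(private, faulty=None, lie=lambda s, r, v: v):
    """Two-round interactive consistency for N = 4 and one fault.

    private[i] is processor i's own value. Messages sent by processor
    `faulty` pass through the hypothetical lie(sender, receiver, value).
    Returns the vector each processor decides on.
    """
    def send(s, r, v):
        return lie(s, r, v) if s == faulty else v

    # Round 1: direct[r][s] is the value r received directly from s.
    direct = [[send(s, r, private[s]) for s in range(N)] for r in range(N)]
    # Round 2: relay[r][j][s] is what j tells r that it received from s.
    relay = [[[send(j, r, direct[j][s]) for s in range(N)] for j in range(N)]
             for r in range(N)]

    vectors = []
    for k in range(N):
        vec = []
        for s in range(N):
            if s == k:
                vec.append(private[k])  # a processor trusts its own value
                continue
            # One direct copy plus relays from the two remaining processors.
            votes = [direct[k][s]] + [relay[k][j][s] for j in range(N) if j not in (s, k)]
            vec.append(majority(votes))
        vectors.append(vec)
    return vectors
```

Every non-faulty processor ends up with the same vector, and the entries for non-faulty processors equal their true values; the entry for the faulty processor is whatever a majority of the non-faulty processors received from it, or `None` if there is no majority.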

The integrated circuit itself was designed, implementing a CU
for 4-bit words on an NMOS chip of about 100 mil square. The circuit
is also fully testable, mainly because of the structure of its design:
the contents of all of its registers can be observed, and the
combinational part of the circuit is directly testable.

An extended design of the CU is the Communication Interface
(CI) [Fu, 1980b]. The communication structure is regarded as the most
critical part of a fault-tolerant multi-computer environment. The
CI takes into account some practical problems in ultra-reliable
systems; specifically, it addresses fault detection and concurrently
testable hardware. In addition, the CI reports a Consistent
Communication Matrix (CCM) of values to its associated processor, which
uses it to evaluate the integrity of the communication system as a
whole.

The concept of implementing these fault-tolerant algorithms in
hardware is not restricted to the designs of particular circuits.
It could be applied in new computer architectures; one candidate
is the data-flow multiprocessor, which shares some architectural
similarities with the CI.

Thus, the CI provides the means to build a fault-tolerant
multiprocessor system in which consistency among processors for
critical functions is guaranteed. The merit of consistency is that it
does not depend on any assumptions about the faults that may occur, as
long as they are confined to a limited number of processor modules.

2.3  Testability  Considerations  in  Design 

With the advent of VLSI technology, testability considerations
are assuming an ever-increasing role in circuit design. [McCluskey,
1979b] presented a survey of techniques for testing digital systems,
as well as methods for designing easily testable systems; an
extensive bibliography was also included. In addition, [Hayes, 1979;
1980] discussed testability considerations in microprocessor-based
design. General issues relating to testability, testing methods, and
fault modeling were presented, and specific techniques for the
testable design of microprocessor-based systems were discussed.

A new technique for designing easily testable sequential
machines with an arbitrary number of inputs was proposed in [Pradhan,
1980]. The design was shown to be optimal with respect to the length
of its transfer and distinguishing sequences. An efficient checking
sequence for fault detection in the proposed design was also
presented.
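The design in [Pradhan, 1980] is not reproduced here. As a reminder of what a distinguishing sequence is, the Python sketch below checks, for a small hypothetical Mealy machine, whether an input sequence produces a different output response from every initial state, and finds a shortest one by brute force (practical only for tiny machines).

```python
from itertools import product


def response(machine, state, seq):
    """Output sequence of a Mealy machine started in `state`.
    machine[state][inp] = (next_state, output)."""
    out = []
    for inp in seq:
        state, o = machine[state][inp]
        out.append(o)
    return tuple(out)


def is_distinguishing(machine, seq):
    """True if every initial state yields a different output response."""
    responses = [response(machine, s, seq) for s in machine]
    return len(set(responses)) == len(responses)


def shortest_distinguishing(machine, inputs, max_len=6):
    """Brute-force search; exponential in length, so tiny machines only."""
    for n in range(1, max_len + 1):
        for seq in product(inputs, repeat=n):
            if is_distinguishing(machine, seq):
                return seq
    return None


# Hypothetical three-state machine with inputs "a" and "b".
EXAMPLE = {
    "A": {"a": ("B", 0), "b": ("A", 0)},
    "B": {"a": ("C", 0), "b": ("A", 1)},
    "C": {"a": ("A", 1), "b": ("C", 1)},
}
print(shortest_distinguishing(EXAMPLE, "ab"))  # ('a', 'a')
```

No single input separates all three states here, but "aa" does, so observing the response to "aa" identifies the initial state; checking sequences are built by combining such distinguishing sequences with transfer sequences.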
1. Ninth Annual Fault-Tolerant Computing Symposium (FTCS-9), Madison,
Wisconsin, June 20-22, 1979, attended by E.J. McCluskey and Behzad
Khodadad.

2. AIAA Computers in Aerospace Conference II, Los Angeles, California,
October 22-24, 1979, attended by E.J. McCluskey.

3. Asilomar Conference on Circuits, Systems and Computers, Pacific
Grove, California, November 5-7, 1979, attended by E.J. McCluskey.

4. COMPCON Spring 80, San Francisco, California, February 25-28, 1980,
attended by E.J. McCluskey.

5.  Workshop  on  Fault  Tolerant  VLSI  Design,  Santa  Monica,  California, 
April  23-25,  1980,  attended  by  E.J.  McCluskey. 

6.  IEEE  Workshop  on  Design  for  Testability,  Boulder,  Colorado,  April 
16-17,  1980,  attended  by  E.J.  McCluskey. 

7.  Computer  Elements  Committee  Workshop,  Vail,  Colorado,  June  22-25, 
1980,  attended  by  E.J.  McCluskey. 

8. 10th Annual Fault-Tolerant Computing Symposium, Kyoto, Japan,
October 1-3, 1980, attended by E.J. McCluskey, P.L. Fu, R.K. Iyer and
D.J. Lu.
Fault-Tolerant Computing (FTCS-3), June 1973

General  Chairman,  Third  Symposium  on  Operating  Systems,  1971 

SEMINARS AND INVITED LECTURES AND PAPERS (selected):

"Testing and Diagnosis of Logic," Invited Paper, EURO IFIP 79,
P.A. Samet, Editor, London, England, September 25-28, 1979

"Logic Design for Multi-Level Integrated Circuits," Distinguished
Lecturer Series, Computer Science Dept., Carnegie-Mellon
University, Pittsburgh, Pennsylvania, March 8, 1978

OTHER  PROFESSIONAL  ACTIVITIES: 

The Annals of the History of Computing, Editorial Board, AFIPS,
1978-

Design  Automation  and  Fault-Tolerant  Computing,  Editorial  Board, 
1977- 

Digital  Processes,  Editorial  Board,  Delta  Publishing  Company, 
Ltd.,  Switzerland,  1975- 

Computer Design and Architecture Series, Elsevier North-Holland,
Inc. (formerly American Elsevier), New York, 1973-

Visiting  Committee  for  Information  and  Computer  Science,  Georgia 
Institute  for  Technology,  1978 

Patentee with T.T. Dao and L.K. Russell, "Multivalued Integrated
Injection Logic Circuitry and Method," No. 4,140,920, February 20,
1979

National  Academy  of  Science,  Planning  Group  for  Education, 
Computer  Science  and  Engineering  Board,  1968-1973 

Commission on Engineering Education COSINE Committee, 1965-1972
SELECTED PUBLICATIONS


Books


INTRODUCTION to the THEORY of SWITCHING CIRCUITS, McGraw-Hill
Book Co., New York, New York, 1965.

DESIGN of DIGITAL COMPUTERS, with H.W. Gschwind, Springer-Verlag,
New York, New York, 1975.

"Logic Design," ENCYCLOPEDIA OF COMPUTER SCIENCE, 2ND EDITION,
A. Ralston, Editor, Petrocelli/Charter, New York, New York, 1980.

Journal Papers

"Minimization of Boolean Functions," B.S.T.J., Vol. 35, No. 6,
pp. 1417-1444, November 1956.

"Iterative Combinational Switching Networks - General Design
Considerations," IRE Trans. on Electronic Computers, Vol. EC-7,
No. 4, pp. 285-291, December 1958.

"Error-Correcting Codes - A Linear Programming Approach,"
B.S.T.J., Vol. 38, No. 6, pp. 1485-1512, November 1959.

(With S.H. Unger) "A Note on the Number of Internal Variable
Assignments for Sequential Switching Circuits," IRE Trans. on
Electronic Computers, Vol. EC-8, No. 4, pp. 439-440, December
1959.


(With M.C. Paull) "Boolean Functions Realizable with Single
Threshold Devices," Proc. IRE, Vol. 48, No. 7, pp. 1335-1337,
July 1960.

(With F.W. Clegg) "Fault Equivalence in Combinational Logic
Networks," IEEE Trans. on Computers, Vol. C-20, No. 11,
pp. 1286-1293, November 1971.

(With D.P. Siewiorek) "An Iterative Cell Switch Design for
Hybrid Redundancy," IEEE Trans. on Computers, Vol. C-22, No. 3,
pp. 290-297, March 1973.

(With K.P. Parker) "Analysis of Logic Circuits with Faults Using
Input Signal Probabilities," IEEE Trans. on Computers, Vol.
C-24, No. 5, pp. 573-578, May 1975.

(With J.J. Shedletsky) "The Error Latency of a Fault in a
Sequential Digital Circuit," IEEE Trans. on Computers, Vol.
C-25, No. 6, pp. 655-659, June 1976.

"Logic Design of Multi-Valued IIL Logic Circuits," IEEE Trans.
on Computers, Vol. C-28, No. 8, pp. 546-559, August 1979.

(With J.P. Hayes) "Testability Considerations in
Microprocessor-Based Design," Computer Magazine, pp. 17-26, March 1980.


Conference Papers


(With A. Grasselli) "Une Version Modifiée d'Algol pour la
Programmation Logique (I)," Proc., 2ème Congrès de
l'Association Française de Calcul et de Traitement de
l'Information, Paris, France, October 7-20, 1961.

"Transients in Combinational Logic Circuits," REDUNDANCY
TECHNIQUES for COMPUTING SYSTEMS, pp. 9-46, R.H. Wilcox and V.C.
Mann, Editors, Spartan Books, Washington, D.C., 1962.

"Fundamental Mode and Pulse Mode Sequential Circuits," Proc.,
2nd Int'l Federation on Information Processing Congress, pp.
725-730, Munich, West Germany, August 27 - September 1, 1962
(North-Holland Publishing Company, Amsterdam, Netherlands).

"Logical Design Theory of NOR Gate Networks with No Complemented
Inputs," Proc., 4th Annual Symposium on Switching Circuit Theory
and Logical Design, S-156, pp. 137-148, IEEE, Chicago, Illinois,
September 1963.

(With J.F. Wakerly) "Design of Low-Cost General-Purpose
Self-Diagnosing Computers," Proc., IFIP Congress '74, pp. 108-111,
Stockholm, Sweden, August 3-10, 1974.

(With T.T. Dao, L.K. Russell and D.R. Preedy) "Multilevel IIL
with Threshold Gates," Proc., IEEE Int'l Solid-State Circuits
Conference, Philadelphia, Pennsylvania, February 16-18, 1977.

(With J.F. Wakerly) "Microcomputers in the Computer Engineering
Curriculum," Microprocessors-2, Invited Papers, Infotech Int'l
Ltd., Berkshire, England, 1977.

(With S. Bozorgui-Nesbat) "Design for Autonomous Test," PROC.
1980 TEST CONFERENCE, Philadelphia, Pennsylvania, November

Reports 

(With T.G. Belden, R. Bosak, W.L. Chadwell, L.S. Christie, J.P.
Haverty, R.H. Scherer and W.S. Torgerson) "Computers in Command
and Control," Tech. Rpt. No. 61-62, Institute for Defense
Analyses, Arlington, Virginia, November 1961.

(With J.B. Dennis, D.C. Evans, W.H. Hudgins, M. Karnaugh, J.F.
Kaiser, F.F. Kuo, S. Seely, W.H. Surber, M.E. Van Valkenburg and
L.A. Zadeh) "Computer Science in Electrical Engineering,"
COSINE Committee on Engineering Education, September 1967.

(With W.F. Atchison, S.D. Conte, J.V. Hamblen, T.E. Hull, T.A.
Keenan, W.S. Kehl, S.O. Navarro, W.C. Rheinboldt, E.J. Schweppe,
W. Viavant, and D.K. Young) "Curriculum 68," Communications of
the ACM, Vol. 11, No. 3, pp. 151-197, March 1968.
Stanford, CA 94305
(415)  497-1448 


HOME  ADDRESS:  2708  Greer  Road 

Palo  Alto,  CA  94303 
(415)  856-1861 


BIOGRAPHY 

BORN:  Washington,  D.C.,  18  October  1947 

EDUCATION : 

B.S.  M.I.T. 

M.S. Stanford

H.S.  Stanford 

Ph.D.  Stanford 


EDUCATIONAL  HONORS: 

National  Merit  Scholarship  1965-1969 

National  Honor  Society  Scholarship  (declined)  1965 

IBM Graduate Fellowship 1974-1976

PROFESSIONAL  ACTIVITIES: 

Member of AAAS, ACM, IEEE Computer Society, SWE, WISE, Sigma Xi
Treasurer  of  Santa  Clara  Group  of  IEEE  Computer  Society  (1977-1978) 
Publicity  Chairwoman  for  Fifth  Annual  Symposium  on  Computer 
Architecture,  Palo  Alto,  3-5  April  1978 
Session  Chairwoman  for  3rd  USA-Japan  Computer  Conference,  San  Francisco, 
10-12  October  1978 

PROFESSIONAL  EXPERIENCE: 

Lecturer  in  Computer  Science  (April-June  1979) 

Computer  Science  Department,  Stanford  University 

Computer  Science  105,  Introduction  to  Computing.  Undergraduate 
course  using  the  programming  language  PASCAL. 

Research  Associate  (April  1978  -  present) 

Center  for  Reliable  Computing,  Computer  Systems  Laboratory 
Stanford  University 

Research  on  performance  and  reliability  evaluation  of  computing 
systems. Supervisor: Prof. Edward J. McCluskey.
Teaching assistant (October-December 1977)

Electrical  Engineering  Department,  Stanford  University 

Computer  Science  311,  (also  Electrical  Engineering  482),  Advanced 
Computer Organization. Graduate course in computer architecture.

Research  assistant  (July  1976  -  September  1977  and  January-April  1978) 

Digital  Systems  Laboratory,  Stanford  University 

Research  on  the  performance  and  reliability  of  gracefully  degrading 
computing  systems.  Supervisor:  Prof.  Edward  J.  McCluskey. 

IBM  Graduate  fellow  (October  1974  -  June  1976) 

Digital  Systems  Laboratory,  Stanford  University 

Research  on  dual  redundant  and  gracefully  degrading  computer 
systems.  Supervisor:  Prof.  Edward  J.  McCluskey. 

Research  assistant  (April  1974  -  September  1974) 

Digital  Systems  Laboratory,  Stanford  University 

Research  on  dual  redundant  computer  systems. 

Supervisor:  Prof.  Edward  J.  McCluskey. 

Systems  engineer  (April  1973  -  March  1974) 
interfaces for Fourier Analyzer computer systems.
Fairchild Semiconductor, Mountain View, California

Programming  in  computer-aided  design  of  integrated  circuits. 


CONSULTING: 


National Semiconductor, 1978
Hughes,  1978- 

Technology  Development  Corporation,  1979 
PUBLICATIONS:

"A Markov model for reconfigurable computer systems," (with E. Fregni),
Tech. Note No. 43, Digital Systems Lab., Stanford Univ., 1974.

"Dual redundancy - a survey," Tech. Note No. 93, Digital Systems Lab.,
Stanford Univ., April 1977.


"Performance related reliability measures for computing systems,"
Tech. Note No. 101, Digital Systems Lab., 1976.



"Performance-related  reliability  measures  for  computing  systems,"  Proc. 
FTCS-7, 7th Annual International Conference on Fault-Tolerant
Computing, pp. 16-21, June 1977.

"Performance  considerations  for  the  reliability  analysis  of  computing 
systems," Ph.D. Dissertation, Stanford Univ., April 1978.

"Performance  considerations  for  reliability  analysis:  A  statistical 
case study," Tech. Note No. 126, Digital Systems Lab., Stanford
Univ.,  1978. 

"A  statistical  analysis  of  service  interruptions  at  the  SLAC  Triplex 
multiprocessor,"  Tech.  Rpt.  No.  141,  Digital  Systems  Lab.,  1976. 

"Performance  considerations  for  reliability  analysis:  A  statistical 
case  study,"  Proc.  FTCS-8,  Eighth  Annual  International  Conference 
on  Fault-Tolerant  Computing,  Toulouse,  France,  p.  198,  June  1978. 

"Performance-related  reliability  measures  for  computing  systems,"  IEEE 
Trans. on Computers, Special Issue on Fault-Tolerant Computing,
June  1978,  pp.  540-547. 

"A  statistical  analysis  of  failures  in  the  SLAC  computing  center," 
Digest  of  Papers,  Spring  COMPCON  79,  San  Francisco,  CA, 
pp.  49-52,  26  February  -  1  March  1979. 

"Stochastic  behavior  of  failures  in  computing  systems,"  Tech.  Note, 
Computer  Systems  Lab.,  (in  preparation). 
1. FULL NAME


RAVISHANKAR KRISHNAN IYER


2.  ADDRESS  CENTER  FOR  RELIABLE  COMPUTING 

COMPUTER  SYSTEMS  LABORATORY 
Departments of Electrical
Engineering and Computer Science
Stanford  University 
Stanford  CA  94305 
Ph.  415-497-1448;  415-323-9112. 


3. DATE AND PLACE OF BIRTH: 4 December 1949, New Delhi, India.

4. NATIONALITY: Australian

5. MARITAL STATUS: Single


6.  DETAILS  OF  EDUCATION  CAREER 


(i) B.E. (Electrical) Electronics
Communication, 1973.

University of Queensland
Brisbane, Australia.

(ii) M.Eng.Sc. (Qual.) 1974.

(iii) Ph.D. 1977, University of
Queensland, Australia.


7.  PRIZES  AND  AWARDS 


(i)  Prize  and  plaque  for  a 
research  paper  (graduate 
category)  IEEE  Student  paper 
contest  1977. 

(ii) First Australian student
paper, IEEE (Australian Section)
prize, 1977.

(iii)  Royal  Norwegian  Council  for 
Scientific  and  Industrial 
Research Fellowship 1977.

(iv)  CSIRO  (Commonwealth  Scientific 
and  Industrial  Research 
Organization,  Australia) 
Postdoctoral  scholarship 

(for  young  scientists)  1978. 

(v)  IBM  World  Trade  Visiting 
Scientist, T.J. Watson
Center,  1980. 
8. ACADEMIC AND PROFESSIONAL
EXPERIENCE


(a) Tutor in Electrical Engineering

University  of  Queensland  1973-1977. 

(b)  Norwegian  Institute  of  Technology 
University of Trondheim, Norway 1977-1979.

(c)  Computer  Systems  Laboratory 

Stanford University 1979 -

(on  CSIRO  Fellowship). 

(d)  Visiting  Lectures  and  Seminars 

University  of  Stuttgart,  W.  Germany. 

Danish  Technical  University,  Denmark 
University  of  Lund,  L.  M.  Ericsson,  Sweden 

(e)  Referee,  reliability  journals 

and  conferences. 


9. PROFESSIONAL AFFILIATION: Member, IEEE

10.  RESEARCH  WORK 

During the past 6 years or so I have been engaged in developing
probabilistic models for studying system performance degradation.

Particular  attention  has  been  paid  to  modelling  computer  systems;  two 
major  classes  of  problems  have  been  studied: 

A)  Reliability  Analysis  and  Design  (catastrophic  failure,  fault 
tolerance,  transient  errors) . 

B)  Performance  Evaluation  (including  overflow  streams,  feedback  and  priority 
queues) . 

In  Australia  sections  of  this  work  were  supported  by  Telecom  Australia, 

The  Radio  Research  Board,  The  Electrical  Research  Board  and  The  Queensland 
Electrical  Authorities  Research  Committee. 

In Norway the research was supported by The Norwegian Telecommunications
Administration Research Establishment.

Brief descriptions of the problems investigated in each of the above
areas appear below:

A.  (1)  Reliability  Modelling,  Fault  Tolerance. 

a)  Optimal  reliability  design  of  series-parallel  systems  [2]. 

b) Evaluation and optimization of the reliability of complex
structures [4], [5].
c) Fault tolerant configurations [15].


d) Study of the effect of uncertainty in failure rate prediction
on the reliability and performance of fault tolerant systems [18].

(2)  Transient  Error  Analysis 

a)  An  analytical  model  for  the  transient  error  generation 
process  [14]. 

b)  Optimal  allocation  of  check-points  and  rollback  intervals  [14]. 

(3) Prediction of Software Reliability

a)  Development  of  a  finite  sampling  model. 

b)  A  Bayesian  approach  to  failure  rate  estimation. 

B.  (1)  Performance  Studies. 

a) Statistical analysis of system load and failures at the
Stanford Linear Accelerator Center (SLAC) computer multiprocessor [18].

b)  Analysis  of  system  behaviour  due  to  variations  in  system 
component  qualities  (grades  or  tolerances)  [1],  [6]. 

c)  Intermittent  failures;  effect  on  performance:  Development 
of  a  shock  model . 

(2) Approximate Techniques for the Analysis of Feedback and Priority
Queues  in  Computer  Systems. 

A job/task graph formulation has been proposed to describe the
workload on a system. Three approaches to generating approximate
solutions are being studied, and their goodness is tested by simulation [16]:

a) Piecewise models

b)  A  modified  Taylor  approximation  model. 

c)  A  state  dependent  level  crossing  formulation. 

(3)  Study  of  Overflow  Streams  [10]. 

(4) A Multiserver Model for a Metropolitan Telephone Network [8].
LIST OF PUBLICATIONS

[1] (coauthor T. Downs), "A Note on the Computation of Large-Change
Multiparameter Sensitivities," International Jnl. of Circuit
Theory and Applications, vol. 4, pp. 307-310.

[2] (coauthor T. Downs), "A Variance Minimization Method of Reliability
Design," IEEE Trans. on Reliability, vol. R-26, pp. 106-110, June
1977.

[3]  "Some  Applications  of  Variance  Minimization  in  Electrical 
Engineering,"  IEEE  Student  Paper  Award  1977. 

[4] (coauthor T. Downs), "Moment Approach to the Evaluation and Optimization
of Complex System Reliability," IEEE Trans. on Reliability, vol. R-25,
pp. 226-229, Aug. 1978.

[5] "Approximations to Moments of Parallel System Lifetime based on Sampling
from a Finite Population," IEEE Trans. on Reliability, vol. R-28,
pp. 225-229, June 1979.

[6] (coauthor T. Downs), "Computation of Finite Change Multiparameter
Sensitivities with Application to Automatic Design," 15th IREE
International Convention, Sydney 1975, Convention Digest pp. 52-54.

[7] "Some Applications of Probability Theory to the Solution of Electrical
Engineering Problems," Aust. Applied Mathematics Conference,
Terrigal, February 1977.

[8] (coauthor T. Downs), "A Probabilistic Model for Optimization of
Telephone Networks," Proc. Eighth International Teletraffic Congress,
Melbourne 1976. Reprinted by invitation, Australian Telecommunications
Research, vol. 11, 1977.

[9] (coauthor T. Downs), "Applications of Variance Minimization in
the Design of Electronic Systems and Communication Networks,"
IREECON International (16th IREE International Convention), Melbourne 1977,
Convention Digest pp. 175-177.

[10] "A New Variance Formula for the Evaluation of Overflow Traffic in
Teletraffic Networks," Tech. Note 1/77, University of Queensland.

[11] (coauthors K.W. Chan, T. Downs), "Optimum Maintenance Scheduling
for Communication Systems," IREECON International (16th IREE International
Convention), Melbourne 1977, Convention Digest pp. 178-180.

[12] (coauthors T. Downs, G. Lovely), "Investigation into the Assignment
of Tolerances to Minimum Sensitivity Networks," Report Nos. 1, 2, 3, 4 to
Telecom Australia (Contract No. CO 47840), Dec. 1975 - Aug. 1977.
[13] (coauthor T. Downs), "A Variance Minimization Approach to the Choice of
Component  Grades  in  Linear  Systems,”  1978  European  Conf.  Circuit  Theory 
and  Design,  Lausanne  Switzerland. 

[14] (coauthor P.J. Emstad), "Transient Error Analysis and Check-Point
Placement in Computer Systems," Tech. Report, Norwegian Institute of
Technology, Trondheim, Norway.

[15] "On the Employment of Variance for the Reliability Modelling of Fault
Tolerant Systems," Proc. FTCS-9, Madison, Wisconsin, June 1979.

[16] (coauthor P.J. Emstad), "An Approximate Method for the Solution of
M/G/1 Queue with Feedback," Tech. Report, Norwegian Institute of
Technology, Trondheim, Norway.

[17]  (coauthor  T.  Downs),  "A  Variance  Minimization  Approach  to  Tolerance 
Design,"  to  appear  in  IEEE  Trans.  CAS.  1980. 

[18]  (coauthor  S.  E.  Butner) ,  "A  Statistical  Study  of  Reliability  and 
System  Load  at  SLAC,"  Submitted  for  presentation  FTCS-10,  Japan. 

[19] "A Study of the Effect of Uncertainty in Failure Rate Prediction
on System Reliability," Submitted for presentation FTCS-10, Japan.